Ignore UTF-8 surrogate characters when encoding request content #3655

Closed
nguyenhoan1988 wants to merge 3 commits into encode:master from nguyenhoan1988:master
@nguyenhoan1988

Summary

  • Add tests asserting surrogate code points are dropped from request bodies and JSON.
  • Change all internal UTF-8 encodings to use errors="ignore" to avoid "surrogates not allowed" Unicode errors.
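For context, here is a minimal standard-library sketch of the behavior the two bullets describe. It does not use any of the project's own code. The sample string and the variable names are illustrative assumptions, not taken from the PR's tests.

```python
# A lone surrogate code point (here U+D83D) has no valid UTF-8 encoding.
# The sample text is a hypothetical example, not one of the PR's test inputs.
text = "hello \ud83d world"

# Default (strict) error handling raises UnicodeEncodeError.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    strict_error = exc.reason
else:
    strict_error = None

# errors="ignore" silently drops the surrogate instead of raising,
# so the encoded bytes no longer match the original text.
ignored = text.encode("utf-8", errors="ignore")

print(strict_error)  # surrogates not allowed
print(ignored)       # b'hello  world'
```

Note that with `errors="ignore"` the payload is changed without any signal to the caller: the two spaces around the surrogate end up adjacent in the output.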

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

@cclauss
Contributor

cclauss commented Sep 4, 2025

Please rebase to pick up the fixes merged in:

@lovelydinosaur
Contributor

I understand that this PR may be closed in case there was no previous discussion.

Thank you, no. I wouldn't consider errors='ignore' an improvement here.


3 participants